Options#

Delimiter#

delimiter sets the field delimiter for all tables. If it is not given, the delimiter is detected automatically, and if detection fails it defaults to ','. If a value exists in datapackage.json, that value is used instead. The delimiter must be a single character.

Example:

from csvs_convert import csvs_to_sqlite
csvs_to_sqlite('output.db', ['data.csv', 'data2.csv'], delimiter=",")
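
The detect-then-fall-back behaviour described above can be pictured with csv.Sniffer from the Python standard library. This is only an illustrative sketch: it is not csvs_convert's own detection logic, and the helper name guess_delimiter and the list of candidate delimiters are assumptions made for the example.

```python
import csv


def guess_delimiter(sample: str, default: str = ",") -> str:
    """Guess the delimiter of a CSV text sample, falling back to `default`.

    Hypothetical helper: uses the standard library sniffer, not csvs_convert.
    """
    try:
        # Restrict the sniffer to a few common single-character candidates.
        return csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
    except csv.Error:
        # The sniffer could not decide, so use the default.
        return default


print(guess_delimiter("a;b;c\n1;2;3\n"))
```

A value found this way could then be passed explicitly as delimiter=... when automatic detection is not reliable for a particular file.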

Quote#

quote sets the quote character for all tables. If it is not given, the quote character is detected automatically, and if detection fails it defaults to '"'. If a value exists in datapackage.json, that value is used instead. The quote character must be a single character. This option does not apply to Parquet conversion.

Example:

from csvs_convert import csvs_to_sqlite
csvs_to_sqlite('output.db', ['data.csv', 'data2.csv'], quote='"')
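
To see why the quote character matters, the sketch below parses the same line with two different quote characters using the standard library csv module. It illustrates the general CSV concept only and does not use csvs_convert; the helper name parse_line is an assumption made for the example.

```python
import csv
import io


def parse_line(text: str, delimiter: str = ",", quote: str = '"') -> list:
    """Parse a single CSV line with the given delimiter and quote character."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    return next(reader)


line = "1,'Smith, Jane'"

# With the matching quote character the comma inside the quotes is kept.
print(parse_line(line, quote="'"))

# With the default '"' the single quotes are ordinary text and the field is split.
print(parse_line(line))
```

If a file is quoted with something other than '"', passing the matching character as quote=... avoids fields being split in the wrong place.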

Delete Input CSV#

delete_input_csv deletes the input CSV files after they have been converted.

Example:

from csvs_convert import csvs_to_sqlite
csvs_to_sqlite('output.db', ['data.csv', 'data2.csv'], delete_input_csv=True)

Drop#

drop applies when inserting into a database: if a table with the same name already exists, the table is dropped before the new data is loaded. SQLite and PostgreSQL only.

Example:

from csvs_convert import csvs_to_sqlite
csvs_to_sqlite('output.db', ['data.csv', 'data2.csv'], drop=True)
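
The effect of dropping a table before loading can be sketched in plain SQL with the standard library sqlite3 module. This is a rough equivalent for illustration, not csvs_convert's implementation; the helper load_rows is hypothetical, and its behaviour when drop is False (appending to the existing table) is a choice made for this sketch only.

```python
import sqlite3


def load_rows(conn, table, header, rows, drop=False):
    """Load rows into `table`, optionally dropping an existing table first."""
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    if drop:
        # Remove any previous table of the same name, along with its data.
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_rows(conn, "data", ["id", "name"], [(1, "a"), (2, "b")])

# Second load with drop=True: the earlier rows are gone afterwards.
load_rows(conn, "data", ["id", "name"], [(3, "c")], drop=True)
remaining = conn.execute('SELECT id, name FROM "data"').fetchall()
print(remaining)
```

Because dropping discards the previous contents of the table, it suits full reloads rather than incremental updates.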

Evolve#

See Evolve

Stats#

stats produces summary statistics about the data in the parsed CSV files.

from csvs_convert import csvs_to_sqlite
datapackage = csvs_to_sqlite('output.db', ['data.csv', 'data2.csv'], stats=True)

The returned datapackage now contains stats about the files.

A CSV version of the stats can be produced with stats_csv, which is the path where the CSV file should be written.

from csvs_convert import csvs_to_sqlite
datapackage = csvs_to_sqlite('output.db', ['data.csv', 'data2.csv'], stats_csv='/path/to/file.csv')

If the stats_csv option is set, the stats option is enabled automatically.

Threads#

threads=n speeds up type checking and stats generation. However, some statistics are not generated in threaded mode.

Example:

from csvs_convert import csvs_to_sqlite
datapackage = csvs_to_sqlite('output.db', ['data.csv', 'data2.csv'], stats=True, threads=8)

This will use 8 threads. Using a number close to the number of cores in your computer should give the fastest results.
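
Rather than hard-coding the thread count, it can be derived from the machine at run time. The sketch below is one possible approach using os.cpu_count from the standard library; the helper names default_thread_count and convert_with_all_cores are assumptions for the example, and only the stats and threads options shown earlier are used.

```python
import os


def default_thread_count() -> int:
    """Return the number of logical cores, or 1 if it cannot be determined."""
    return os.cpu_count() or 1


def convert_with_all_cores(output_db, csv_paths):
    """Hypothetical wrapper: convert with one thread per logical core."""
    # Imported here so default_thread_count() works without csvs_convert installed.
    from csvs_convert import csvs_to_sqlite

    return csvs_to_sqlite(
        output_db, csv_paths, stats=True, threads=default_thread_count()
    )


print(default_thread_count())
```

Since some statistics are not generated in threaded mode, it may be worth comparing the output against a single-threaded run before relying on it.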